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0\ Abstract 

This paper discusses a novel fully implicit formulation for a one-dimensional electrostatic 

r-| particle-in-cell (PIC) plasma simulation approach. Unlike earlier imphcit electrostatic PIC 

Oh approaches (which are based on a linearized Vlasov-Poisson formulation), ours is based on 

CLi a nonlinearly converged Vlasov-Ampere (VA) model. By iterating particles and fields to a 

^ tight nonlinear convergence tolerance, the approach features superior stability and accuracy 

2 properties, avoiding most of the accuracy pitfalls in earlier implicit PIC implementations. In 

^ particular, the formulation is stable against temporal (Courant-Priedrichs-Lewy) and spatial 

O (aliasing) instabilities. It is charge- and energy-conserving to numerical round-off for ar- 

^ bitrary implicit time steps (unlike the earlier "energy-conserving" explicit PIC formulation, 

^ (-| which only conserves energy in the limit of arbitrarily small time steps). While momentum is 

^ Qh not exactly conserved, errors are kept small by an adaptive particle sub-stepping orbit inte- 

grator, which is instrumental to prevent particle tunneling (a deleterious effect for long-term 
^ accuracy). The VA model is orbit-averaged along particle orbits to enforce an energy con- 

servation theorem with particle sub-stepping. As a result, very large time steps, constrained 
O only by the dynamical time scale of interest, are possible without accuracy loss. Algorith- 

(•T) mically, the approach features a Jacobian-free Newton-Krylov solver. A main development 

^ in this study is the nonlinear elimination of the new-time particle variables (positions and 

O velocities). Such nonlinear elimination, which we term particle enslavement, results in a non- 

^ linear formulation with memory requirements comparable to those of a fluid computation, 

and affords us substantial freedom in regards to the particle orbit integrator. Numerical 
• '-j examples are presented that demonstrate the advertised properties of the scheme. In partic- 

ular, long-time ion acoustic wave simulations show that numerical accuracy does not degrade 
<^ even with very large implicit time steps, and that significant CPU gains are possible. 



1. Introduction 

The evolution of coUisionless plasmas in the presence of electromagnetic fields is well de- 
scribed by the Vlasov- Maxwell set of equations. The Vlasov equation describes the evolution 
of the probabihty distribution functions (PDFs) for one or more species, while electromag- 
netic fields evolve according to Maxwell's equations. The field-PDF description is tightly 
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coupled. Maxwell's equations (or a subset thereof) are driven by moments of the PDF such 
as charge density and/or current density. The PDF, on the other hand, follows a hyperbolic 
equation in phase space, whose characteristics are self-consistently determined by the fields. 

The numerical solution of such strongly coupled systems has proved challenging. Here, 
we concern ourselves with particle-in-cell (PIC) techniques |1]-|3], which integrate the Vlasov- 
Maxwell equations in time by combining a mesh (to represent fields) with particles (to follow 
characteristics of the PDF in phase space). The PIC approach has been very successful, 
enabling many first-principle kinetic calculations of plasma dynamics since the early years 
of computer simulations (e.g. [,4j). Key aspects of the method involve the definition of 
interpolation operations between mesh and particle quantities, and the coupled temporal 
integration of field and particle equations. 

The most common PIC implementation employs an explicit time integration of the field 
and particle equations [Ij (e.g., the full-/ explicit momentum-conserving method). Explicit 
approaches provide a straightforward recipe for temporal integration. However, standard ex- 
plicit PIC approaches suffer from temporal numerical stability constraints (the well-known 
Courant-Friedrichs-Lewy or CFL time step limit). Furthermore, some explicit PIC imple- 
mentations also feature spatial stability constraints (e.g., the momentum-conserving explicit 
Vlasov-Poisson PIC formulation, which requires the Debye length to be resolved in order to 
avoid so-called finite-grid instabilities) [Ij. As a result of these stability constraints, explicit 
PIC becomes very demanding computationally when applied to multidimensional configura- 
tions in the presence of general electromagnetic fields |5]. 

Implicit methods, however, can free the PIC approach from numerical stability con- 
straints, and thus have the potential of much improved algorithmic efficiency. This real- 
ization drove the exploration of implicit PIC starting in the 1980s p-120]. These studies 
explored the viability of an implicit PIC formulation and its accuracy properties, and re- 
sulted in important developments such as the implicit-moment method |6H9[ [HI |20] and the 
direct-implicit method [TIH [T21 HSl [IZl US]- However, limitations of the solver technology at 
the time forced early implicit PIC practitioners to rely on approximations such as lineariza- 
tion and lagging, which did not respect the strong field-particle coupling. These numerical 
approximations produced energy conservation errors that could result in significant plasma 
self-heating or self-cooling |21) . 

In this study, we explore a fully implicit PIC solver, based on Newton-Krylov meth- 
ods [22j, in which field-particle couplings are converged to a tight nonlinear tolerance. For 
this proof-of-principle study, we employ a one-dimensional (ID) electrostatic Vlasov- Ampere 
(VA) model. This study builds on earlier studies for the ID Vlasov-Poisson (VP) [23j and 
VA [21] equations. The emphasis here is on both accuracy and efficiency. By iterating parti- 
cles and fields to a tight convergence tolerance, the approach features superior stability and 
accuracy properties, avoiding most of the accuracy pitfalls in earlier implicit PIC implemen- 
tations. In particular, we will show that the formulation is stable against both temporal and 
spatial instabilities. The formulation is charge- and energy-conserving to numerical round- 
off for arbitrary implicit time steps. This is unlike the explicit "energy-conserving" PIC 
formulation of Lewis |25], which conserves energy only in the limit of arbitrarily small time 
steps. Automatic exact charge conservation[2B] is achieved by ensuring that each particle 
moves within a given cell. Momentum is not exactly conserved in our approach, but errors 
are kept small by consistent spatial smoothing and an adaptive particle sub-stepping orbit 
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integrator, which is instrumental to prevent particle tunneling (a deleterious effect for long- 
term accuracy). The time-centered VA equations are orbit-averaged along the sub-stepped 
particle orbits [27J to enable an exact energy conservation theorem. As a result, very large 
time steps, constrained only by the collective dynamical time scale of interest, are possible 
for the field solver without significant accuracy loss. 

Algorithmically, the approach features an unpreconditioned Jacobian-free Newton- Krylov 
(JFNK) solver. A main algorithmic contribution of this study is the nonlinear elimination 
of the new-time particle variables (positions and velocities), which we term particle enslave- 
ment. This results in a two-fold advantage. On one hand, the nonlinear formulation features 
computer memory requirements comparable to those of a fluid computation. On the other 
hand, it affords us substantial freedom in regards to the particle orbit integrator, enabling 
the implementation of an adaptive charge- conserving mover, as described earlier. 

Numerical examples are presented that demonstrate the advertised properties of the 
scheme. In particular, we have performed multiscale ion acoustic wave (lAW) simulations. 
The numerical results show that an accurate orbit integration is key for long-term accuracy. 
Furthermore, our results demonstrate that a standard explicit VP momentum-conserving 
PIC solver requires a timestep much smaller than the explicit CFL to provide comparable 
accuracy to the fully implicit PIC algorithm employing a much larger time step. Efficiency- 
wise, we argue that large CPU gains are possible, especially in multiple dimensions, when 
the system size is much larger than the Debye length. We demonstrate moderate CPU gains 
numerically with the ID lAW problem with our unpreconditioned Newton-Krylov solver, 
underscoring the algorithmic potential of the approach. 

The rest of the paper is organized as follows. Section [2] describes the Vlasov- Ampere 
model in the continuum and its application to ID electrostatic plasma simulation. Section 
[3] describes the details of our discrete Vlasov-Ampere implementation, with emphasis on 
energy and charge conservation, as well as our adaptive particle mover. Section |4] describes 
the concept of particle enslavement, and provides details of our JFNK solver implementation. 
Section [5] demonstrates the advertised accuracy and efficiency properties of the scheme with 
standard electrostatic problems such as Landau damping, two-stream instability, and the 
ion acoustic wave. Finally, we conclude in section [6| 

2. Electrostatic Vlasov-Ampere model 

A coUisionless electrostatic plasma is described by the Vlasov-Poisson equations: 

0, (1) 

^ (2) 
-V0, (3) 

where /^(r, v) is the particle distribution function of species a in phase space, qa and ma are 
the species charge and mass respectively, and E are the self-consistent electric potential 
and field respectively, p(r) = J2a1al d'vfai''^,^) is the charge density, and is the vacuum 
permittivity. This system of equations is commonly used for modeling the behavior of an 
electrostatic plasma in one or more dimensions pj. An alternate formulation can be written 
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using Ampere's law, which is derived in the electrostatic limit as follows. We start with the 
charge continuity equation: 

|.V,^0, ,4) 

where j = / wdvfa is the plasma current density. We substitute Eq. Q in Eq. Q to 

have 

(5) 

In ID, one finds the integral of Eq.(|5| exactly as 

eo^+J = Cit) (6) 

where C{t) is an arbitrary function of time. By integrating Eq.([6]) over the periodic domain, 
one finds C{t) = (j), where (j) = J jdx/ J dx is the spatial average of the current density, 
and hence Eq.(|6]) becomes: 

eo^+J = (j). (7) 

The right hand side of Eq.Q is a solvability condition for the ID electrostatic Ampere 
equation. It is also required to preserve Galilean invariance of the system. 

In the continuum, the VP and VA formulations are equivalent. In the discrete, however, 
they have different properties. Generally, VP is momentum and charge conserving, while VA 
can be energy and charge conserving, as we will show. We will also show that the discrete 
VA and VP can be equivalent under special conditions [see Sec. (3.6)]. In what follows, we 
consider the VA formulation, as it generally has better stability properties |27| . 

The VA model can be extended to multi-dimensions by replacing j — (j) with the longi- 
tudinal piece of the current, j -|- V x (V~^(V x j), where is the inverse Laplace operator. 
The latter expression ensures that the electric field obtained from Ampere's law is conserva- 
tive [i.e., that Eq.([3]) is enforced]. Alternatively, one can derive an evolution equation for the 
electric field (or, equivalently, the electrostatic potential 0) by using the charge continuity 
equation Eq.Q, Gauss' law Eq.([2]), and the electrostatic approximation Eq.([3]) to find: 

eo^-V-j = 0, 

with E found from Eq.([3]). This equation also requires inverting a Laplace operator to find 



3. Implicit particle-based discretization of the VA model 

We begin to develop a suitable implicit discretization of the VA equations by considering 
a time-centered [Crank-Nicolson (CN)] formulation, written as 

7-in+l rpn 

^o^^^+j'r^' = {jr'^\ (8) 



— = (9) 

n+l 



At 
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where subscripts i and p denote grid points and particles respectively, superscript n denotes 

, • , .71+1/2 

time step, 



n+l/2N n+1/2 



YAx is the time-centered plasma current den- 
){E^^^ + -E'j")/2 is the time-centered electric field, and 



71+1/2 w ™ + i 



sity, E{x;^'/') 

the half time step particle position and velocity are defined by x 

n+l/2 
Jp 



n+1/2 
P 



(Xl 



+ x1+^)/2 and 

"^p ' — K^p +'^p^^)/2) respectively. The CN method is implicit (a particle's future position 
and velocity depend on the future acceleration), symplectic, second order accurate in time, 
and unconditionally stable. Most importantly, CN is non-dissipative, which enables an exact 
energy conservation theorem. Next, we proceed to show that the time centered VA equations 



8P-(10) admit a discrete energy conservation theorem. 



3.1. Exact energy conservation theorem 

Multiplying Eq.(lO) by Vp^^^"^ and summing over all particles, we find that 



E + <) - <) = E vr'%E{x;^'/^)At 



introducing def. ofE(x"+^/^) 



pn+l I rpn 

E ^tv;^'% E , ' Six. - 



commuting sums — )■ 



rpn+1 I 7-171 



using Ampere's law ^ = E f^t - eo - E^) 



Er+i + E? 



Ax 



E^r^"'^ = o^ = -E? (K^')'-(K 



Ax 



where the last step is justified because is a gradient of and we are considering a periodic 
domain. We therefore conclude that 



1 n+l 



EAx 



i.e., the electrostatic system (|8j)-(10) features an exact energy conservation theorem, whereby 
the total energy [defined as the sum of kinetic energy (Yip ^^'^p) ^^"^ ^^e electric field energy 
(Yli ^^oEf)] is exactly conserved from time step n to time step n+l. We emphasize that 
this result is enabled by a VA formulation that uses (a) a CN time discretization and (b) 
identical current assignment and force interpolation shape functions. It is important to note 
that exact conservation of energy requires the implicit field and particle equations to be 
updated in a nonlinearly consistent manner every time step. This is distinctly different from 
previous implicit schemes |T6| |28| [29] , which do not feature nonlinear consistency. It is also 
different from the "energy-conserving" scheme developed by Lewis [25], which is explicit and 
does not conserves energy exactly with finite At. 

The benefit of the existence of an energy conservation theorem is that finite-grid insta- 
bilities, commonly seen in explicit PIC simulations when Ax > A^, are eliminated. This 
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property has the potential of significant computational savings of implicit PIC vs. explicit 
PIC, as many fewer grid points (and consequently fewer particles) will be required for the 
former in regimes where kXi) <^ 1. Additionally, time steps much larger than AtcFL ~ ^^pe 
are possible without numerical instabilities or numerical heating or cooling. However, sta- 
bility is not the only requirement, as accuracy also needs to be ensured, particularly when 
employing large time steps. In the next section, we discuss the issue of accuracy in our 
implicit PIC implementation, and our approach to mitigate potential numerical errors. 

3.2. Particle sub-stepping and orbit- averaging 

In multiple time-scale problems, u ^ Upe, and a large time step is desired. However, 
as pointed out by Langdon |28l l30] . large inaccuracies in particle orbits may result when a 
large time step is employed, such that a particle moves a distance much greater than a cell 
size. The accuracy impact of orbit errors can be clearly seen by computing the dielectric 
constant of a homogeneous plasma [30]. For instance, in the intermediate frequency regime 
Upi ^ a; ^ Upe, the electron response in the dielectric constant is expected from theory to 
be 

ethcory = 1 + {kX^y'^ + (ion response). 

With a CN discretization of particle orbits, however, the numerical response when kvth^t ^ 
1 is ^ 

ecN — 1 + -(wpeAt)^ + (ion response), 

which is much larger than etheory in the limit considered. Thus, accuracy requires that 

Ax , , 

At < — . 12 

Vth 

This is an undesirable constraint, especially for multiscale problems for which the time-scale 



of the slow dynamics is much larger than that determined by Eq.(12). 

In this study, we address the issue of orbit accuracy by considering particle sub-stepping 
where different time steps for orbit and field integration are employed: small steps (Ar such 
that kmaxVth^T < 1 with kmax = vt/Ax) are used to ensure the accuracy of the particle 
orbits and the collective response; meanwhile, a larger time step (At > Ar) is used in the 
field solver to follow the low-frequency dynamics. This approach allows slow particles to 
be integrated quickly and fast particles to be integrated accurately. We will discuss the 
numerical implementation of particle sub-stepping in the overall algorithm in Sec.(|4j). 

In principle, particle sub-stepping breaks the energy conservation theorem. However, 
energy conservation can be recovered by considering an appropriate orbit-averaged plasma 
current density to advance E in Eq.([8]). The appropriate time-centered, orbit averaged 
current density is defined as 

^aL-X^ *^<"- - (13) 

u=l p 

where u is the a sub-step index, Ni, is the number of sub-steps, and At = ^t^- Each 

sub-step is a CN move with time step At'^, which can change from one sub-step to another: 

Ar- ^ ' ^ ' 
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with E{xp'^^'^) = T.iS{xi - + E^)/2. Note that the E-field used to move 

the particles is averaged over the macro-steps {n and n + 1), which is consistent with the 



assumption that the electric field evolves in a slower time scale. Using j 
energy conservation theorem is recovered as follows: 



r" m Eq.(8 



the 



p p u 



p 

p V 

n+1 



77"*+-'- _L 7?'* 

= E E ^-'<"'% E - 

p V i 
i 

= J2^'o[iEr'r-iErr]. (le) 

i 

3.3. Charge conservation 

Standard methods of current and charge assignment in VA do not satisfy the continuity 
equation Q [1]. Accordingly, charge is not conserved locally, and Gauss's law is violated. 
Methods for automatic, exact charge conservation for PIC simulations have been developed 
|26l I3TH33] . In general, the charge-conserving schemes are valid only within a given cell, 
and break when a particle crosses a cell boundary. The problem is usually resolved [26] by 
splitting current contributions between adjacent cells, as follows: 

jp = 3 pi + Jp25 

where jp = qpAxp/At, jpi = qpAxpi/ At, jp2 = qpAxp^l At, Axp\ and Axp2 are segments of 
the particle displacement within a cell, and Axp\ -|- Axp2 = Axp. By doing so, charge is 
strictly conserved. However, in our context, this approach would break energy conservation, 
because it requires the time-centered, orbit-averaged current density shown in the preceding 
section. Here, we pursue an alternative way of assigning charge-conserving currents, which 
is to actually make the particle stop at each cell boundary. It can be readily shown (see 



Appendix Appendix A) that this prescription leads to exact charge conservation for each 



particle sub-step. The property also holds in an orbit-averaged sense. This can be readily 



seen by taking the orbit-average Ylu=i ^^'^] of the continuity equation for each sub-step 
(z/ ^ Z/ + 1): 

n'^+l _ n'' ■u+1/2 .1^+1/2 

Ar^ Ax ' 
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and using Eq.^ and that (p'l^y^ - = 'EuiPiti/2 " Pi+1/2)' find: 

„n+l _ „n -n+1/2 -n+1/2 

/^»+l/2 Pi+1/2 3i+i - 3^ ^ Q ,^g^ 

At Aa; ■ ^ ^ 

This is the continuity equation in terms of the orbit-averaged plasma current. It follows 
that, once appropriate particle sub-stepping and orbit-averaging schemes are used, our VA 
implementation simultaneously conserves energy and charge exactly. Unlike the exact energy 
conservation, charge conservation is ensured regardless of nonlinear convergence tolerance. 
This is so because the property is built into the particle mover. 

3.J^. Particle tunneling and adaptive orbit integration 

The energy- and change- conserving particle mover does not enforce exact momentum 
conservation. We have found that momentum conservation errors are exacerbated by particle 
tunneling. This occurs when a particle tunnels through a potential energy barrier, which is 
statistically unavoidable with a mover employing a fixed time step (see FigjlJ. The orbit 
error caused by particle tunneling affects momentum conservation, because the particle ends 
up traveling in the wrong direction. While only a few particles experience tunneling, we 
have found its accuracy impact to be deleterious [This will be demonstrated numerically in 
Sec.Q]. 

Particle tunneling is avoided by our charge-conserving particle mover, as particles are 
forced to stop at cell boundaries. To further improve the accuracy of our orbit integrator, 
we have designed an adaptive orbit-integration algorithm. We wish to control the sub-time 



step At of a particle such that the local truncation error of Eqs.(14) and (15) is below a 
specified tolerance. This leads to the condition: 

\\le{Ar)\\,<ea + ery{Ar)\\^, (19) 
where ||-|| 2 denotes the L2-norm of enclosed vector, /e(Ar) = {a^, (^^V pj } is a mea sure 



of the local truncation error of the discrete orbit equations (see Appendix Appendix B), 
and Er are absolute and relative tolerances, respectively, and r°(Ar) = {v^,a^}AT is the 
initial residual. The key to the effectiveness of the approach is in the estimation of dap/dx 
in /e(Ar). Rather than computing a local estimate based on the current particle position, 
we compute a cell-based estimate dap/dx ^ ^{Ei^i —Ei)/Ax (where we have assumed that 
the particle is located between grid points i and i + 1 at time level u), and analytically 
continuate this estimate beyond the cell boundaries. This provides a numerical potential 
barrier for trapped particles against tunneling, as illustrated in FigjlJ With this estimate. 



Eq.(19) results in a quadratic equation for an upper limit for At. As before, particles stop 



at cell boundaries to enforce charge conservation. 
3. 5. Space filtering 

In the conventional momentum-conserving scheme, finite-grid instabilities result in large 
energy conservation errors [1]. In energy-conserving schemes, aliasing errors result in the 
loss of momentum conservation. The aliasing errors arise as the wavelengths of the particle 
moments, which are sampled in continuous space, can be much shorter than those of moments 
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Figure 1: Illustration of particle tunneling through a potential barrier. With charge-conserving, adaptive 
stepping, a trapped particle is not allowed to tunnel through the potential barrier. 

defined on the mesh. Because of the finite number of grid points, mode perturbations 
with fcAa; > vr contribute to the modes with klS.x < ix. The particle shape function can 
be viewed as a low pass filter to reduce aliasing errors: a B-spline of order m has the 
form of Sm{k) = (sin |fcAx/^A;Ax)™, which depresses short-wave-length signals. One can 
achieve further filtering by increasing m, but this has disadvantages (for instance, it results 
in extended stencils which complicate a parallel implementation). Alternatively, one can 
apply space filtering of grid quantities, which is recommended to further reduce noise signals 
with kAx — )■ vr [34j. One simple example is the cos^ (^^) (binomial) filter in Fourier space. 
In real space, it is equivalent to replacing a grid quantity Qi with a binomial operator SM 
such that 

SM{Q), = + '^f + . (20) 

This is the approach followed in this study. Implementation-wise, two extra operations are 
introduced in the implicit algorithm. Firstly, the binomial operator is used on the electric 
field on the mesh, which is then interpolated to the particles. The corresponding particle 
motion per sub-step is given by 



V, 



Ar'^ 



Ar^ 



v+l/2 



(21) 
(22) 
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Secondly, the binomial operator is used on the orbit-averaged current density in Ampere's 
equation: 

^0^^^^ + SMQ)r" = Hr"' . (23) 

Note that the smoothing does not change the spacial average of j on the right-hand-side. It 
is important to note that both the energy and charge conservation properties still hold with 
binomial smoothing: one simply needs to realize that, in a periodic domain, YlijiSMlE)^ = 
Y^,- SM{j)iEi, which allows the proof in Eq.(16) and Eq.(17)-(18) to follow through. 



3.6. Equivalence of VA and VP 

From the discussion above, it is clear that both Ampere's equation and the charge conti- 
nuity equation are strictly satisfied. It is therefore easy to deduce that Gauss's law, if satisfied 
initially, is also strictly satisfied at all times. This follows trivially from the following: 



n + 1 _ „ -n + 1/2 -71+1/2 > =^ T Pt + 1/2 — 7 Pi + 1/2 

At ~^ Ax ~ ^ ■> ' 

The discrete equivalence of the VA system and VP system is thus evident. Consequently, 
the implicit energy- and charge-conserving scheme proposed here can also be viewed as 
an energy-conserving implicit VP formulation, and therefore shares a strong connection 
to Lewis's explicit energy-conserving scheme[25j. Both methods solve the same system of 
equations and feature exact charge conservation. Both schemes define E and p half grid 
apart, and the interpolation schemes are identical, namely, the shape function (B-spline) of 
E is one order lower than that of p. 

There are, however, important differences between the two energy- conserving methods. 
While Lewis's is explicit and energy-conserving only when At — 0, ours is implicit and 
exactly energy-conserving for arbitrary At. Moreover, unlike explicit PIC schemes (including 
Lewis's), in which the time step is fixed and usually the same for all particle and field 
equations, our implicit approach allows different time steps to be used for particles vs. fields. 
This enables complete flexibility in the orbit integration stage, which allows momentum error 



control [see Sec. (3.4)]. 



4. Formulation of nonlinear residual: nonlinear elimination/particle enslavement 



The goal of this study is to find the nonlinear root of the system of equations (21)-(23). 
This nonlinearly coupled set of equations can be conceptually formulated as a nonlinear 
residual of the form: 

F(E,^) = 0. (24) 

Here, E = \Ei\ with i representing mesh points, and ^ = {a;p,fp} with p representing 
particles. Upon convergence, E = {E'"'*'^}, and ^ = {x^"^^, f^"^^}, i.e., the new-time field 
and particle quantities. In formulating this residual, we have naively taken both particle 
and field variables as dependent variables. While in principle correct, this formulation has 
one fundamental limitation, namely, that the residual has inherited the full dimensionality 
of the kinetic problem. For any iterative solution method, this will result in exceedingly 
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large storage requirements. Given that memory is the principal bottleneck in current and 



future massively parallel computers, pursuing the formulation in Eq.(24) does not appear 
promising. 

There is, however, an alternative, which is to nonlinearly eliminate the kinetic component 
to the low-dimensional description in terms of fields and moments. The concept of nonlinear 
elimination is akin to the well-known elimination process in linear equations, and can be 
explained as follows. Let us consider a nonlinear residual F(Xi,X2), with Xi, X2 two sets 
of dependent variables. Let us also divide the global residual into two sets of equations, 
Fi(Xi,X2) = and F2(Xi,X2) = 0, and assume that F2(Xi,X2) can be straightforwardly 
written in an explicit form as X2 = f2(Xi). It follows that a new residual can be formulated 
as: 

Fi(Xi,f2(Xi)) = G(Xi) = 0. (25) 

By construction, the new residual G has lower dimensionality than the original one F (and 
thus requires less storage for the Krylov and Newton iterations), but has the same nonlinear 
solution. One successful example of the use of nonlinear elimination to significantly reduce 
the dimensionality of the nonlinear residual in a nontrivial application (the solution of the 
Fokker-Planck transport equation) is reported in Ref. [35j . 



In our context, the key point is that the particle equations of motion (21) and (22) can 
be expressed as X2 = f2(Xi) if one considers Xi = E and X2 = That is, for each particle 
and for a given E on the mesh, one can explicitly solve Newton's equations for the new- 
time particle position and velocity, [x'^'^^ , v'!^^^) . This requires a local nonlinear solve, as the 
particle position affects the interpolation of the electric field. However the particle equations 
of motion are not stiff (we employ substepping), and we have found that Picard's method 
of successive approximations (with a nonlinear tolerance of 10"^*^) is sufficient to solve the 
particle equations of motion effectively in each sub-timestep. 



As a result, the original residual Eq.(24) can be formally rewritten as a function of E 
only: 

G(E) = 0. (26) 

This is what we term particle enslavement ( particle quantities do not appear explicitly in the 
residual). It shows that the full kinetic description can be equivalently formulated in terms 
of a low- dimensional residual, with solver memory requirements (e.g., nonlinear residuals, 
Krylov subspace vectors, etc.) comparable to those of a moment/fluid description. (It should 
be noted that one still needs to save old-time and new-time particle quantities as auxiliary 
variables to compute the nonlinear residual.) The concept is not restricted to electrostatic 
PIC, and can be trivially generalized to an electromagnetic formulation with Xi = {Ei, Bi} 
and X2 as above. Note that particle enslavement allows complete flexibility in the particle 
integration step, X2 = f2(Xi). Since this operation is segregated in the calculation of the 
residual G, it is ideally suited to exploit heterogeneous computing architectures such as 
general-purpose graphics processing units (GPGPUs). This will be explored in future work. 
Finally, we note that this flexibility has enabled us to explore the advanced features of our 
orbit integrator, namely, particle sub-stepping, orbit averaging, temporal adaptivity, and 
charge conservation. 



Finding the root of the enslaved residual Eq.(26) is not a simple task, because nonlin- 



ear couplings are nonlocal on the mesh, and are very nontrivial due to the complex orbit 
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integration/averaging step. Jacobian-free Newton-Krylov (JFNK) methods present impor- 
tant advantages for this apphcation, which we discuss briefly below. The motivated reader 
can find extensive discussions on this approach elsewhere |22] . Newton's method solves the 
nonlinear system G(x) = (with x = E) iteratively by inverting linear systems of the form: 



dG 



(fc) 

(5x(^-) = -G(x('=)), 



with x*^'^"'^^'' = x^^) + Sx^^\ where (k) denotes the nonlinear iteration number. Nonlinear 
convergence is achieved when: 

||G(x('=))||2<e, + e,||G(x(°))||2 = ei, (27) 

where ||-||2 is the L2-norm, = x 10^^^ (with the total number of degrees of free- 
dom) is an absolute tolerance to avoid converging below round-off, is the Newton relative 
convergence tolerance (set to 10^® in this work, unless otherwise specified), and G(x('')) is 
the initial residual. 

Such linear systems are solved iteratively with Krylov methods, which only require 
matrix-vector products to proceed. Because the linear system matrix is a Jacobian matrix, 
such matrix-vector products can be implemented Jacobian-free using the Gateaux derivative: 



dG 



, G(xW + ev)-G(xW) , , 

v = lim^ ^ ^ ^, (28) 



where in practice a small but finite e is employed (p. 79 in ^22]). Thus, the evaluation of 
the Jacobian- vector product only requires the function evaluation G(x(^^ +ev), and there is 
no need to form or store the Jacobian matrix. This, in turn, allows for a memory-efficient 
implementation. 

An inexact Newton method [3Bj is used to adjust the convergence tolerance of the Krylov 
method at every Newton iteration according to the size of the current Newton residual, as 
follows: 

pWs^ik) + G(x('=))||2 < C^"^^ ||G(x('))||2 (29) 

where (^^''^ is the inexact Newton parameter and J^^-* = ^ | is the Jacobian matrix. Thus, 
the convergence tolerance of the Krylov method is loose when the Newton state vector x^'^^ 
is far from the nonlinear solution, and tightens as x*^^^ approaches the solution. Superlinear 
convergence rates of the inexact Newton method are possible if the sequence of (^'^^ is chosen 
properly (p. 105 in [22] )• Here, we employ the prescription: 



^A(fc) 



lG(x(^): 



|G(x('=-i))||, 



C^e^) = min[C™,.,max(C^('=),7C^'-'^)], 
C^'^) = min[C™,,,max(C^(^),7- 



IGfxWl 



with a = 1.5, 7 = 0. 9, and (max = 0.8. The convergence tolerance is defined in Eq.(27). 
In this prescription, the first step ensures superlinear convergence (for a > 1), the second 
avoids volatile decreases in Cfc, and the last avoids oversolving in the last Newton iteration. 
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A further advantage of Krylov methods is that they can be preconditioned by considering 
the alternate (but equivalent) systems j('=)(p('^))~ip('=)5x*^'^'' = — G'-'^^ (right preconditioning) 
or (P^''^)~^ J^''^ 5x^^^ = — (p('^))~iG*^'^'' (left preconditioning). Such preconditioned systems 
can be straightforwardly and efficiently implemented in the Krylov algorithm as two consec- 
utive matrix- vector products. A crucial feature of preconditioning is that, while it can sub- 
stantially improve the convergence properties of the Krylov iteration if (pC^))"! ^ (J^'^))"^, 
it does not alter the solution of the Jacobian system upon convergence (because the solution 
^x*^'^) of the preconditioned system is the same as that of the original system). Therefore, 
one can explore suitable approximations in the preconditioner for efficiency purposes without 
compromising the accuracy of the converged result. 

For this study, we have employed unpreconditioned JFNK (with P*^^^ the identity oper- 
ator) to demonstrate the feasibility and the accuracy properties of our fully implicit elec- 
trostatic PIC implementation. Despite the lack of a preconditioner, we show in the next 
section that large CPU speedups over explicit approaches are still possible in ID, demon- 
strating the potential of the approach. Multidimensional applications will require effective 
preconditioning, which will be the subject of future work. 



5. Numerical examples 

To demonstrate the accuracy and performance of the present algorithm, we use three 
standard electrostatic test cases: Langmuir wave, two-stream instability, and ion acoustic 
wave. We first show that a simple CN mover (i.e., without sub-stepping) can accurately 
simulate the behavior of Langmuir waves, including cold plasma oscillations and Landau 
damping in a warm plasma. We then simulate a two-stream instability, demonstrating the 
effects of charge conservation and adaptive sub-stepping on momentum conservation. In 
the lAW case, in which large implicit time steps {upeAt ^ 1) can be taken, we discuss the 
impact of the particle mover on the accuracy of long-time integration, and demonstrate that 
moderate CPU gains of the implicit scheme vs. the explicit scheme in the kXi) <^ 1 limit 
are possible. 

In all the cases, we assume a homogeneous coUisionless electrostatic plasma equilibrium 
with some initial perturbation 



fa{x,V,t = 0) = faoiv) 



1 + a cos I ^ X 



(30) 



where /qq is the initial velocity distribution of species a, a is the perturbation level, L is 
the domain size, and rih is the mode number of the perturbation [rih = 1 by default). The 
computational domain is one-dimensional, featuring a uniform grid and periodic boundary 
conditions. All the temporal and spatial quantities are normalized by the inverse plasma 
frequency (l/wpe) and the Debye length (A^), respectively. Other parameters are provided in 
specific test cases below. To verify the simulations using the implicit algorithm, we use either 
analytical solutions, e.g., linear damping or instability rates, or results from the conventional 
explicit momentum-conserving VP PIC simulations. By default, binomial spatial smoothing 



[Eq.(20)] is used; first-order B-spline is used for interpolating current density j in the implicit 
scheme, and second-order B-spline is used for interpolating charge density p in the explicit 
scheme. 
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Figure 2: Langmuir wave in a cold plasma. Leapfrog and simple CN are used in explicit and implicit schemes 
respectively. Both schemes use a small time step {At = 0.1). The two curves depict the total electric field 
energy E"^ = J^i ^^id are very close to each other. The theoretical plasma frequency (wpe = 1) is accurately 
reproduced. 



5.1. Langmuir wave 

We first consider a cold plasma Langmuir wave, in which electrons are perturbed and 
oscillate collectively around stationary and uniform ions. Simulations are performed by 
both explicit and implicit schemes using the same set of simulation parameters: L = 2tt, 
Nx = 32, Np = 2000, At = 0.1, a = 0.01, which are the domain size, number of grid points, 
number of particles, the time step, and the perturbation level [see Eq.(30)], respectively. 
Figure (|2]), shows that the leapfrog and CN schemes agree closely with each other. In these 
simulations, we resolve well the plasma period, which is the only timescale of the problem. 

Next, we consider electron Landau damping of Langmuir waves in a warm plasma, which 
is a pure kinetic effect. Given Maxwellian electrons and no contribution from ions, the wave 
dispersion relation from linear theory is [37]: 



1 + 



1 + 



0, 



(31) 



where Z is the dispersion function of Fried and Conte. Solving Eq.(31) for a given real k 
provides a complex solution {u + ^7), in which u is the wave frequency and 7 is the wave 
damping rate. 

We initialize the problem with a perturbation of a Maxwellian distribution /eo- The 
simulation parameters are L = An, = 32, Np = 40000, a = 0.05, defined as before. 
Using k = 0.5 in Eq.(31), we find u = 1.415 and 7 = 0.154. Note that Landau damping, 
which mostly affects resonant particles with Vp ~ u/k, occurs in a longer timescale than 
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Figure 3: Landau damping test case. Top: Good agreement between the explicit and the impUcit schemes 

is seen for At = 0.1. Bottom: Accurate dainpirig rate is produced by the implicit sclieuie for At = 2 (i.e., 
WpAt = 2 and 'yAt ~ 0.3) even though the plasma frequency is underresolved. For this time step, the explicit 
scheme is CFL unstable. 
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Figure 4: Semi-log scale graph of the electric field energy for the two-stream instability, obtained with the 
simple CN mover and the ACC CN mover. Both simulations use At = 0.2. Numerical growth rates are seen 
to agree very well with the analytical one (7 = 0.5). 

plasma oscillations, so that there is a separation of timescales in this problem. We first 
consider a small time step At = 0.1, which resolves the plasma wave period. We performed 
both explicit and implicit simulations under identical conditions. Figure (|3|a compares the 
results with the analytical growth rate. We see good agreement between the two methods, 
and the numerical damping rate agrees well with the theoretical one. When we increase 
the time step to At = 2 [Fig.(|3])b] (such that the plasma period is under-resolved, but the 
damping rate is still well resolved), the explicit scheme is unstable due to the violation of the 
CFL condition. The implicit scheme, on the other hand, is stable, and accurately reproduces 
the damping rate. We note that only the simple CN mover is employed in these implicit 
simulations, which enforce no local charge conservation. The importance of local charge 
conservation will be demonstrated for longer term simulations in the following sections. 

5.2. Two-stream instability 

We consider two equal counter-streaming electron beams in a stationary and uniform ion 
background. The analytical dispersion relation is |37] 



of the electrostatic energy, showing that the numerical growth rate agrees very well with the 
analytical one with both the simple CN mover and the adaptive-charge-conserving (ACC) 
CN mover. 
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Figure 5: Comparison of the ACC CN and the simple CN particle movers. Time histories of the change in 
total energy, the root mean square of the charge continuity equation, and the total momentum are shown in 
panels a,b, and c, respectively. 
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This test case is ideally suited for exploring the properties of the ACC CN mover vs. 
simple CN. For this, the conserved quantities of the system (energy, charge, and momentum) 
are monitored using both CN movers. In Fig.(|5])a, we see that both CN movers conserve 
the total energy commensurately with the nonlinear tolerance employed [et = lO"*^), as 
expected. Figure ([sjb depicts the root mean square of the discrete charge conservation 
equation at every cell, and confirms that the ACC mover conserves charge exactly, while 
the simple CN mover results in significant charge conservation errors. Finally, Fig. Qc 
demonstrates the substantial improvement in the conservation of total momentum achieved 
by the ACC mover vs. the simple CN mover, particularly in the nonlinear regime. 

5.3. Ion acoustic wave 

Simulating the ion acoustic wave requires considering both electron and ion dynamics. 
The wave is excited by an ion density perturbation, and is propagated by the compression 
and expansion caused by the thermal motion of ions, as well as by incomplete Debye shielding 
due to the thermal motion of electrons. Adding the ion contribution to the dielectric constant 



[cf. Eq.(31)], the dispersion relation of ion acoustic wave can be written as 
1 



1+ " 



V2k \V2kJ\ 



1 + ^ V z 



V2k V V2k 



0, (33) 



where tt = T^/Ti is the ratio of electron and ion temperatures, and = mi/ me is the ratio 
of ion and electron masses. In regimes with r-r ^ 1 and ^ 1, the I AW frequency is much 
smaller than the plasma frequency [uj ^ cUpe), and the wavelength is much longer than the 
Debye length {kXr, <C 1), which make the lAW a truly multiscale problem. 

The main goals of this section are to examine the long-term accuracy of the implicit 
algorithm employing different particle movers, and to provide an accuracy and efficiency 
comparison with a standard explicit VP algorithm with a leap-frog mover. We perform 
implicit and explicit simulations under identical conditions with = 10^, = 200, a = 0.2, 
L = 10, N;j. = 32, Np = 128000. Note that electrons are much faster than ions for these 
parameters. Therefore, both stability and accuracy will be largely controlled by electron 
dynamics. 

We first consider four different implicit movers: a single-step CN mover with At = At = 
0.25 and At = At = 1, a sub-stepping CN mover with At = 4Ar = 1 (i.e., 4 particle sub- 
steps), and an adaptive charge- conserving CN mover. Simulation results with these movers 
for lAW are depicted in Fig. ([6]). Figure (|6])a depicts the evolution of the ion kinetic energy, 
and motivates several important observations. Firstly, the single-step CN with At = 1 fails 
to capture the long-term lAW behavior. This is because the average distance that an electron 
travels is larger than a cell size, namely kmaxVthe^t > 1. In this regime, Debye shielding 
is not captured accurately, as predicted by theory [30j. Decreasing the time step to ensure 
kmaxVthe^t < 1 improves the simulation results (CN with At = 0.25), as expected, albeit 
still with significant errors. Particle sub-stepping such that kmaxVtheAT < 1 also improves 
the simulation results, as is demonstrated by the sub-stepping CN run, but it is still not 
satisfactory. The implicit solve with ACC mover is the only one that is able to accurately 
capture the dynamics despite the long At employed, as will be shown later in this section. 

Figure (|6])b shows the evolution of the electron kinetic energy. The relative magnitude of 
the perturbation in the electron kinetic energy is much smaller than that in the ion kinetic 
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Figure 6: Accuracy test of the implicit algorithm using three movers: simple CN mover (cn), sub-stepped CN 
mover (sub-cn) , and adaptive-charge-conserving CN mover (acc-cn) . a) Time history of ion kinetic energy 
shows the impact of using fixed-time step movers in a long-time simulation, b) Time history of electron 
kinetic energy shows the importance of using small (sub-)time steps to capture the electron dynamics, 
which exhibits relatively small variations, c) Total energy is well conserved in all implicit simulations, d) 
Momentum conservation is substantially improved when particle orbits are accurately calculated using the 
charge-conserving self-adaptive CN mover. 
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Figure 7: Ion kinetic energy [Ki 
simulations. 



^p(imWp)i] obtained by implicit (with ACC-CN mover) and explicit 



energy, suggesting that small errors in the electron kinetics are sufficient to ruin the accuracy 
of the ion wave. 

Figure ([6])c and (|6|d show the time evolution of the conserved quantities of the system, 
namely, total energy and total momentum, respectively. Clearly, energy is conserved exactly, 
while momentum is not. Similar to the two-stream instability case [Fig.(|5])], we see significant 
improvement in the momentum conservation when exact charge conservation is enforced 
by the ACC mover. With other movers, momentum errors accumulate in time, and the 
system can develop nonphysical drifts, which eventually compromise the accuracy of the 
lAW dynamics. 

It is instructive at this point to compare the implicit ACC-CN solver with the explicit 
VP solver. For the latter, we have employed explicit time steps of At = 0.1 and At = 0.005, 
both of which resolve well the plasma frequency (i.e., the CFL condition is satisfied). The 
results of the comparison are shown in Fig.([7]). Most strikingly, only the explicit simulation 
with the smaller time step agrees well with the implicit ACC solver, despite the fact that the 
latter takes a very large time step for the field equations {At = 4, or uAt ~ 0.14, 800 times 
larger than the explicit one). This suggests that f^/j-based CFL limits, while sufficient for 
stability, may not be enough to guarantee accuracy. In fact, a fj/i-based CFL can introduce 
large orbit integration errors for fast electrons, which may accumulate in time. The only way 
to control these long-term errors is to further reduce the explicit time step to resolve the fast 
electron orbits, thus exacerbating the computational efficiency issues of explicit PIC. 

The robustness of the implicit solver with respect to the nonlinear tolerance Et for the 
various movers is examined next. This is of interest because the level of energy conservation 
depends on Et- Figure ([s]) shows the comparison of simulations using two nonlinear tolerances: 
et = 10~^ and et = 10~^. Even though the energy conservation is not exact with Et = 10~^, 
the total energy is still conserved well (with errors per time step ~ 10^^). We find that the 
ACC mover is the most robust with respect to changes in the nonlinear tolerance. This again 
stresses the point that orbit accuracy, charge-conservation, and the avoidance of particle 
tunneling are key to the overall long-term accuracy of the implicit solver. 

5.4- CPU gain of the implicit scheme vs. the explicit scheme 

Having demonstrated the accuracy of the present implicit scheme, we proceed to argue 
and demonstrate by numerical experiment that the implicit scheme can also be significantly 
more efficient than the explicit one, even without a preconditioner. 
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Figure 8: Comparison of the ion kinetic energy history using two nonUnear tolerances (10~* and 10~®) in the 
implicit simulations. Three particle movers are tested: simple CN mover (cn), sub-stepping CN mover (sub- 
CN) and adaptive-charge-conserving CN mover (ACC-CN). The cn mover and sub-cn mover show strong 
sensitivity to the nonlinear tolerance (a, b sub-panels), illustrating the effect of accumulated errors in particle 
orbits. In contrast, the acc-cn particle mover is mostly insensitive to changes in the nonlinear tolerance. 
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As has been demonstrated, the proposed energy- and charge- conserving imphcit PIC 
scheme is not hmited to the stringent stabihty constrains featured by conventional exphcit 
PIC schemes, namely, UpeAt < 1 and Ax < Xd- Instead, we can choose the time step to 
resolve the dynamical timescale of interest, which is in many cases much slower than the 
plasma frequency. Similarly, we can choose the grid size to resolve the relevant spatial scales, 
which can be much larger than the Debye length. Thus, there is significant potential for CPU 
speed-up of implicit vs. explicit PIC schemes. 

We have designed the particle integration scheme to follow orbits accurately. At first 
glance, it may appear that the cost of orbit sub-stepping would offset any potential efficiency 
gains of the implicit approach. This would certainly be the case if electron sub-time steps 
were comparable to their explicit CFL. However, sub-time steps in our implicit scheme are 
not limited by the CFL condition. Instead, sub-steps are typically At < Ax/Vp, which can 
still be much larger than the explicit CFL when Ax ^ Xd, and only become restrictive for 
very few fast particles. It follows that the implicit scheme should still able to deliver large 
CPU gains with respect to explicit, despite particle sub-stepping. 

A back-of-the-envelope estimate confirms that this is the case when kXo I (the relevant 
physics regime in many applications of interest). We begin to compute the expected CPU 
gain by estimating the CPU cost for a given PIC solver as: 



where AT is the time-span of the simulation, Up is the number of particles per cell, {L/ Ax) 
is the number of cells, d is the number of physical dimensions, and C is the computational 
complexity of the solver employed, measured in units of a standard explicit PIC VP leap- 
frogged time step. The implicit-to-explicit speedup is thus given by: 



For simplicity, we assume that all particles take a fixed sub-timestep At in the implicit 
scheme, and that the cost of one time step with the explicit PIC solver is comparable to 
that of a single implicit sub-step (a good approximation when both are dominated by the 
cost of moving particles). It follows that Cim ~ NpE (At/AT)-^, i.e., the cost of the implicit 
solver exceeds that of the explicit solver by the number of function evaluations per time step 
Nfe multiplied by the number of particle sub-steps (At/Ar)^^. The number of function 
evaluations per time step NpE is roughly equal to the sum of the number of Krylov iterations 
plus the number of Newton iterations in a given time step. If an optimal preconditioner is 
available, NpE can be made independent of problem dimension and size (see e.g. jSHHlO] for 
examples in the context of fluid modeling of plasmas). 

Assuming typical values for Arj^ ~ 0-^Ax/vth, At^x ~ O.l/wpe? Axim ~ 0.2/k, and 
Ax ex ~ Xn, we find that the CPU speedup scales as: 




(34) 




CPU, 
CPU. 



ex 



{kXnY+'NpE' 



1 1 



(35) 
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Table 1: Efficiency study of explicit vs. implicit PIC using the lAW example. We have fixed the number of 
particles per cell to 4000, and the resolution of the implicit runs to 32 mesh points. 
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32 


200 


49.6 


15.4 
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0.02 


32 


400 


67.6 


11.96 



which predicts that large gains are possible when kXf) <^ 1, particularly in multidimensional 
applications, but only if NpE is bounded. The latter point underscores the importance of 
developing suitable preconditioning strategies. This will be the subject of future work. 

We proceed to verify the scaling of Eq.(35) with the JAW example. We employ various 
values of kXo with both implicit and explicit PIC. Performance results are summarized in 
Table [Tj In all these runs, we have fixed the number of cells in the implicit runs to 32, 
and we have used a constant number of particles per cell (4000). Explicit time steps and 
resolutions are chosen for stability, not accuracy. Since the latter are much more restrictive 
(as shown earlier in this study), the results in Table [l] should be considered as a conservative 
lower bound in the efficiency gain potential. As we decrease kXo (i-e-, we increase the domain 
size), we confirm that the implicit-to-explicit CPU ratio CPUex/CPUim (last column of Table 
[T| increases linearly with the domain size. This is consistent with the prediction of Eq.(35) 
when one realizes that the number of function evaluation NpE also increases linearly with 
the domain size. This results from our unpreconditioned JFNK implementation combined 
with the fact that we have also increased the implicit time step as the domain size grows. 
Nevertheless, Table [l] shows that moderate CPU gains (up to a factor of 15 for kXo = 0.02) 
can be achieved even without a preconditioner, exemplifying the potential of the implicit 
PIC approach for efficient, kinetic plasma simulation. 



6. Conclusions 

In this study, we have undertaken the task of implementing and demonstrating the accu- 
racy and algorithmic properties of a fully implicit particle-in-cell solver for plasma simulation. 
We have focused on a ID Vlasov- Ampere electrostatic model as a proof of principle, for which 
we have developed an exactly energy-conserving, orbit-averaged discrete formulation. Ex- 
act charge conservation is achieved by an adaptive, charge-conserving, sub-stepping particle 
mover. 

The nonlinear residual is converged using an unpreconditioned Jacobian-free Newton- 
Krylov solver. A tight nonlinear tolerance is required to achieve exact energy conservation. 
Crucial to our implementation is the concept of particle enslavement, which segregates the 
particle orbit integration procedure in the evaluation of the nonlinear residual, and thus al- 
lows a completely general treatment of the particle orbits. This flexibility has been exploited 
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in this study to implement an adaptive charge-conserving particle mover, which is compatible 
with the exact discrete energy conservation theorem. Since particle quantities are not de- 
pendent variables, the JFNK solver features very modest memory requirements, comparable 
to those of fluid simulations (albeit one still needs to store old-time and new-time particle 
positions and velocities as auxiliary variables for the residual computation). Furthermore, 
being a stand-alone process, the particle orbit integration stage opens the opportunity for 
significant acceleration using dedicated hardware such as GPGPUs. 

Numerical experiments with a multiscale problem (the ion acoustic wave) have demon- 
strated the superior accuracy properties of the approach. In particular, we have demon- 
strated that large implicit time steps can be used without accuracy degradation, and that 
explicit approaches need to use time steps much smaller than the CFL to provide compara- 
ble accuracy. Efficiency-wise, we have argued for the potential to achieve large CPU gains 
of implicit methods vs. explicit ones, particularly in multidimensional applications. We 
have demonstrated moderate gains (up to 15) with our unpreconditioned Newton-Krylov 
implementation in ID. 

Albeit admittedly at a proof-of-principle level, this study has resolved a number of lim- 
itations of earlier implicit PIC studies, and has uncovered the potential of fully implicit 
methods for both accurate and efficient kinetic plasma simulation. Future work will focus 
on extending our approach to a fully electromagnetic implementation, on exploring fully 
coupled moment-PIC representations that are better suited for the development of precon- 
ditioning strategies, and on exploiting heterogeneous architectures (e.g., GPGPUs) for the 
integration of particle orbits. 
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Appendix 



Appendix A. Derivation of exact chcirge-conserving ID particle mover 

We proceed to provide a simple derivation of the ID charge- conserving scheme employed 
in this work, which differs from others reported in the literature as explained in the main 
text. As in the main text, we define the current and charge density j and p as 

ji = ^qpVpSm-l{Xp- Xi)/Ax, (A.l) 
p 

Pi+l/2 = ^QpSmiXp- Xi+i/2)/Ax, (A.2) 
P 

where qp is the particle charge, Vp is the particle velocity, Sm is shape function which is a 
B-sphne of order m. Note that the shape function used for the current density is one order 
lower than that for the charge density. In the limit of At —> 0, it is straightforward to show 
that 

dpi+l/2 _ ?p dSm{Xp - Xi+1/2) 



E 

p 



dt ^ Ax dt 

p 

Qp dSjYi dxp 



Ax dxp dt 

Qp 'S'm— l('^p -^i) Sm—l{,-^p -^i+l] 



^Ax ' ' Ax ' ' 

p 

Ji+l — Ji 



Ax ' 

where d^^Smixp - Xi+1/2) = -d^^Smixp - Xj+1/2) = {Sm-i{xp - Xi) - - Xi+i)) /Ax 

is used. With a Crank-Nicolson discretization in time we find that: 

p""*""^ — Pi _ ^ Qp Sm{Xp'^'^ — Xi) — Sfn{Xp — Xi) 



At ^ Ax At 
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Qp ^n+l/2^m-l{Xp^ ^ — Xi) — Sfn-l{Xp'^ ^ — Xj+i) 



Ax ^ Ax 
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■n+1/2 _ ■n+1/2 

Ji+l Ji 



Ax 



where we have Taylor expanded Sm{Xp'^'^ — Xi) and Sm{x", — Xi) about x^^^^'^ 



-■p 



The exact charge conservation is valid only when the particle is moving within a cell and 
m < 2, because the 2nd-order derivative terms cancel exactly, and no higher-order terms are 
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present in the expansions. This is seen as follows. 
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Subtracting Eq.(A.4) from Eq.(A.3), we find exactly that: 



dx ^ P 



(A.3) 



(A.4) 



which has been used in the derivation above. Since the second derivative of the B-spline is 
piecewise continuous only within a cell, it is required that particles stop at cell boundaries. 



Appendix B. Local error estimate of particle orbit integrator 

We proceed to outline the derivation of the local error estimate for the particle orbit 



equations, required for Eq.(19) in the main text. In what follows, we omit the subscript 
"p" in particle quantities for convenience. Let us consider the initial value problem for the 
particle orbits: 

dx 
di 

= n(f T.) i)(f = (W = 7'" 

dt 

A forward Euler's temporal discretization method gives: 



v{t,x), x{t = 0) = 
a(t,x), v{t = 0) = w° 



+ a{t^,x'')/\T. 



(B.l) 
fB.2l 



Using the forward Euler's method as the predictor in a second-order, time-centered dis- 
cretization gives: 



X 



H 



x'' + ^W,x'')+v{t^''\x''+% 



At. 



which is the so-called Heun's formula. A Taylor expansion readily shows that forward Euler 
is first-order accurate, while Heun's formula is second-order accurate: 



X 



X 



H 



(B.3) 
(B.4) 
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The local error introduced in the position by the Euler step can be readily estimated by 
subtracting these two expansions, to find j41] : 



Ar2 



/e. = [vif^\ - a;'')] Ar/2 + O(Ar^) ^ —ait", x^), (B.5) 

where we have used Eq.(B.2) for the last step. Similarly, we find the following error estimate 
for the velocity: 

le, = [a(t^+\ a:^+^) - a(r , x^)] Ar/2 + ©(Ar^) ^ ^ (^^v^ , (B.6) 



where we have Taylor-expanded the acceleration in the last step. Equations (B.5) and (B.6) 
are reported in the main text. 
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